Recovering dialect geography from an unaligned comparable corpus

Author

  • Yves Scherrer
Abstract

This paper proposes a simple metric of dialect distance, based on the ratio between identical word pairs and cognate word pairs occurring in two texts. Different variations of this metric are tested on a corpus containing comparable texts from different Swiss German dialects and evaluated on the basis of spatial autocorrelation measures. The visualization of the results as cluster dendrograms shows that closely related dialects are reliably clustered together, while multidimensional scaling produces graphs that show high agreement with the geographic localization of the original texts.
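The abstract defines the dialect distance only as a ratio between identical and cognate word pairs in two texts. The minimal Python sketch below shows one way such a score could be computed. It is not the paper's exact procedure. Three assumptions are ours: word types are compared rather than tokens; two words count as cognates when their Levenshtein distance, normalized by the longer word, is at most a hypothetical threshold of 0.3; and the ratio is turned into a distance as 1 - identical/cognate.

```python
"""Sketch of an identical/cognate ratio dialect distance.

Assumptions (not taken from the paper): word types are compared, two words
count as cognates when their normalized Levenshtein distance is at most
COGNATE_THRESHOLD, and distance = 1 - identical_pairs / cognate_pairs.
"""
from collections.abc import Iterable

COGNATE_THRESHOLD = 0.3  # hypothetical cut-off, chosen for illustration only


def levenshtein(a: str, b: str) -> int:
    """Classic edit distance (insertion, deletion, substitution all cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def is_cognate(w1: str, w2: str, threshold: float = COGNATE_THRESHOLD) -> bool:
    """Hypothetical cognate test: edit distance divided by the longer length."""
    longest = max(len(w1), len(w2))
    if longest == 0:
        return True
    return levenshtein(w1, w2) / longest <= threshold


def dialect_distance(text_a: Iterable[str], text_b: Iterable[str],
                     threshold: float = COGNATE_THRESHOLD) -> float:
    """Return 1 - identical/cognate over all word-type pairs of two texts."""
    identical = cognate = 0
    for w1 in set(text_a):
        for w2 in set(text_b):
            if is_cognate(w1, w2, threshold):
                cognate += 1
                if w1 == w2:  # identical pairs are a subset of cognate pairs
                    identical += 1
    if cognate == 0:
        return 1.0  # no cognate material at all: maximal distance
    return 1.0 - identical / cognate


if __name__ == "__main__":
    # Invented toy word lists: "huus"/"hus" and "baum"/"boum" are cognate
    # but not identical; "und" is identical in both texts.
    a = ["huus", "baum", "und"]
    b = ["hus", "boum", "und"]
    print(round(dialect_distance(a, b), 3))  # → 0.667
```

A distance of 0 means every cognate pair found is identical; a distance of 1 means no cognate pair is identical (or none was found). Computing this score for every pair of texts yields a distance matrix that could then be passed to hierarchical clustering or multidimensional scaling, the two visualizations the abstract mentions.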


Similar resources

Corpus-based Dialectometry: Aggregate Morphosyntactic Variability in British English Dialects

The research reported in this paper departs from most previous work in dialectometry in several ways. Empirically, it draws on frequency vectors derived from naturalistic corpus data and not on discrete atlas classifications. Linguistically, it is concerned with morphosyntactic (as opposed to lexical or pronunciational) variability. Methodologically, it marries the careful analysis of dialect p...


Globalization, Standardization, and Dialect Leveling in Iran

This paper is an attempt to shed light on the effects of modernization, urbanization, monolingual educational system, and mass media as well as the process of globalization on dialect leveling among Persian dialects. In so doing, the first part of the paper elaborates on the relationship between globalization and sociolinguistics, and on the concept of standardization. Also, it discusses some ...


Effective Factors on Naming Practices in Iran: Sociopolitics or Dialect?

Naming, as an inseparable part of a country's language, has led many linguists to formulate and test hypotheses about the culture and language of the people of a given area. Iran appears to be a suitable setting for research on naming, given factors such as geography and chronology. The present article aims to take a specific look at th...


ACTIV-ES: a comparable, cross-dialect corpus of 'everyday' Spanish from Argentina, Mexico, and Spain

Corpus resources for Spanish have proved invaluable for a number of applications in a wide variety of fields. However, most resources are based on formal, written language and/or are not built to model variation between varieties of Spanish, even though most language in 'everyday' use is informal/dialogue-based and shows rich regional variation. This pa...


Visualization as a Research Tool for Dialect Geography Using a Geo-browser

Moving from a traditional dialect geography research methodology to one in which data are processed electronically and where visualization is used as a research tool can be of great benefit to dialect geography. A working environment offering full support for using visualization as a research tool could take dialect geography into the era of eScience. Despite the advent of electroni...



Publication date: 2012